Approximated Provenance for Complex Applications
Abstract
Many applications now involve collecting large amounts of data from multiple users and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand how information was derived, and consequently difficult to assess its credibility, to optimize and debug its derivation, etc. Provenance has been helpful in achieving such goals in different contexts, and we illustrate its potential for novel complex applications such as those performing crowd-sourcing. Maintaining (and presenting) the full and exact provenance information may be infeasible for such applications, due to the size of the provenance and its complex structure. We propose some initial directions towards addressing this challenge, through the notion of approximated provenance.
Similar resources
PROX: Approximated Summarization of Data Provenance
Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes it difficult to understand the application logic and how information was derived. Data provenance has been proven helpful in this respect in different contexts...
Modelling Provenance Collection Points and Their Impact on Provenance Graphs
As many domains employ ever more complex systems-of-systems, capturing provenance among component systems is increasingly important. Applications such as intrusion detection, load balancing, traffic routing, and insider threat detection all involve monitoring and analyzing the data provenance. Implicit in these applications is the assumption that “good” provenance is captured (e.g. complete pro...
A Process-Driven Approach to Provenance-Enabling Existing Applications
Currently, there are no general provenance management systems or tools available for existing applications. Groups that do not have the resources or expertise to build the provenance infrastructure needed resort to the manual creation and maintenance of this information, greatly hindering their ability to do large-scale and/or complex data exploration and processing. Even with the resources, ap...
Retrofitting Applications with Provenance-Based Security Monitoring
Data provenance is a valuable tool for detecting and preventing cyber attack, providing insight into the nature of suspicious events. For example, an administrator can use provenance to identify the perpetrator of a data leak, track an attacker’s actions following an intrusion, or even control the flow of outbound data within an organization. Unfortunately, providing relevant data provenance fo...
Enabling Provenance on Large Scale e-Science Applications
Large-scale e-Science experiments present unprecedented data handling requirements with their multi-petabyte data storages. Complex software applications, such as the ATLAS High Energy Physics experiment at CERN, run throughout Grid computing sites around the world in a distributed environment, with scientists performing concurrent analysis on data and producing new data products shared among t...